Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
نویسندگان
چکیده
This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi Coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library that incorporates the developments presented, and in general provides to heterogeneous architectures of multicore with coprocessors the DLA functionality of the popular LAPACK library. The LAPACK-compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through use of the high-performance BLAS, hardware-specific tuning, and a hybridization methodology where we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware components by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
منابع مشابه
HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi Coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library that incorporates the developments presented here, and, more broadly, provid...
متن کاملExploring performance of Xeon Phi co-processor
The project aims to explore the performance of Intel Xeon Phi processor. We use various parallelisation and vectorisation methods to port a LU decomposition library to the coprocessor. The popularity of accelerators and co-processors is growing due to their good energy efficiency characteristics, and the large potential of further performance improvements. These two factors make co-processors s...
متن کاملImproving Main Memory Hash Joins on Intel Xeon Phi Processors: An Experimental Approach
Modern processor technologies have driven new designs and implementations in main-memory hash joins. Recently, Intel Many Integrated Core (MIC) co-processors (commonly known as Xeon Phi) embrace emerging x86 single-chip many-core techniques. Compared with contemporary multi-core CPUs, Xeon Phi has quite di↵erent architectural features: wider SIMD instructions, many cores and hardware contexts, ...
متن کاملBenchmarking Parallel Performance on Many-Core Processors
With the emergence of many-core processor architectures onto the HPC scene, concerns arise regarding the performance and productivity of numerous existing parallel-programming tools, models, and languages. As these devices begin augmenting conventional distributed cluster systems in an evolving age of heterogeneous supercomputing, proper evaluation and profiling of many-core processors must occ...
متن کاملPerformance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture
We have developed the astrophysical simulation code XFLAT to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both CPU and Xeon Phi co-processors based on the Intel Many Integrated Core Architecture (MIC). We analyze the performance of XFLAT on configurations with CPU ...
متن کامل